Abstract. We assume that a message may be delivered by packets through multiple hops and
investigate the feasibility and efficiency of an implementation of the Omega Failure Detector
under such an assumption. To motivate the study, we prove that the existence and sustainability
of a leader is exponentially more probable in a multi-hop Omega implementation than in a
single-hop one.

An implementation is: message efficient if all but finitely many messages are sent by a single process;
packet efficient if the number of packets used to transmit a message in all but finitely many messages is
linear w.r.t. the number of processes (packets of different messages may potentially use different
channels, thus the number of used channels is not limited); super packet efficient if the number of
channels used by packets to transmit all but finitely many messages is linear.

We present the following results for deterministic algorithms. If the reliability and timeliness of one
message do not correlate with those of another, i.e., there are no channel reliability properties, then a
packet efficient implementation of Omega is impossible. If eventually timely and fair-lossy channels are
considered, we establish necessary and sufficient conditions for the existence of a message and packet
efficient implementation of Omega. We also prove that the eventuality of timeliness of channels makes
a super packet efficient implementation of Omega impossible. On the constructive side, we present
and prove correct a deterministic packet efficient implementation of Omega that matches the necessary
conditions we established.


1 Introduction 

The asynchronous system model places no assumptions on message propagation delay or 
relative process speeds. This makes the model attractive for distributed algorithm research 
as the results obtained in the model are applicable to an arbitrary network and computer 
architecture. However, the fully asynchronous system model is not well suited for fault 
tolerance studies. An elementary problem of consensus, where processes have to agree 
on a single value, is unsolvable even if only one process may crash [9]: the asynchrony of
the model precludes processes from differentiating a crashed and a slow process. 

A failure detector [6] is a construct that enables the solution to consensus or related
problems in the asynchronous system model. Potentially, a failure detector may be very
powerful and, therefore, hide the solution to the problem within its specification.
Conversely, the weakest failure detector specifies the least amount of synchrony required to
implement consensus [5]. One such detector is Omega.

In the literature, the detector is usually denoted by the Greek letter Ω. However, we use Ω to
denote the lower complexity bound. To avoid confusion, we spell out the name of the failure
detector in English.

Naturally, a failure detector may not be implemented in the asynchronous model itself.
Hence, a lot of research is focused on providing the implementation of a detector,
especially Omega, in the least restrictive communication model. These restrictions deal with
timeliness and reliability of message delivery. Aguilera et al. [1] provide a remarkable
Omega implementation which requires only a single process to have eventually timely
channels to the other processes and a single process to have so-called fair-lossy channels
to and from all other processes. Aguilera et al. present what they call an efficient
implementation where only a single process sends infinitely many messages. In their work,
Aguilera et al. consider a direct channel as the sole means of message delivery from one
process to another. In this paper, we consider a more general setting where a message
may arrive at a particular process through several intermediate processes. Otherwise, we
preserve the model assumptions of Aguilera et al.


Our contribution. We study Omega implementation under the assumption that a message
may come to its destination through other processes.

To motivate this multi-hop Omega implementation approach, we consider a fixed
probability of channel timeliness and study the probability of leader existence in a
classic single-hop and in multi-hop implementations. We prove that the probability of leader
existence tends to zero for single-hop implementations and to one for multi-hop ones as
network size grows. Moreover, the expected time a leader persists while the timeliness of
channels changes tends to zero for single-hop and to infinity for multi-hop
implementations.

If we consider deterministic algorithms, we study three classes of Omega
implementations: message efficient, packet efficient and super packet efficient. In a message efficient
implementation all but finitely many messages are sent by a single process. In a packet
efficient implementation, the number of packets in all but finitely many transmitted
messages is linear w.r.t. the number of processes in the network. However, in a (simple) packet
efficient implementation, packets of different messages may use different channels such
that potentially all channels in the system are periodically used. In a super packet efficient
implementation, the number of channels used in all but finitely many messages is also
linear w.r.t. the number of processes.

Our major results are as follows. If timeliness of one message does not correlate with
the timeliness of another, i.e., there are no timely channels, we prove that any
implementation of Omega has to send infinitely many messages whose number of packets is
quadratic w.r.t. the number of processes in the network. This precludes a packet
efficient implementation of Omega. If eventually timely and fair-lossy channels are allowed,
we establish the necessary and sufficient conditions for the existence of a packet efficient
implementation of Omega. We then prove that this eventuality of timeliness of channels
precludes the existence of a super packet efficient implementation of Omega. We present
an algorithm that, under these necessary conditions, provides a message and packet efficient
implementation of Omega.


Related work. The implementation of failure detectors is a well-researched area.
Refer to [1,2] for detailed comparisons of work related to the kind of Omega
implementation we are proposing. We limit our literature review to the most recent studies
that are closest to ours.
Delporte-Gallet et al. [8] describe algorithms for recognizing timely channel graphs.
Their algorithms are super packet efficient and may potentially be used to implement
Omega. However, their solutions assume non-constant size messages and perpetually
reliable channels. That is, Delporte-Gallet et al. deviate from the model of Aguilera et al.
and the algorithms of Delporte-Gallet et al. do not operate correctly under fair-lossy and
eventually timely channel assumptions.

A number of papers consider Omega implementation under various modifications of
the model of Aguilera et al. Hutle et al. [11] implement Omega assuming a send-to-all message
transmission primitive where f processes are guaranteed to receive the message timely.
Fernandez and Raynal [2] assume a process that is able to timely deliver its message to
a quorum of processes over direct channels. This quorum and channels may change with
each message. A similar rotating set of timely channels is used by Malkhi et al. [14].
Larrea et al. [13] give an efficient implementation of Omega but assume that all channels
are eventually timely. In their Omega implementation, Mostefaoui et al. [15] rely on a
particular order of message interleaving rather than on timeliness of messages. Biely and
Widder [3] consider a message-driven (i.e., non-timer based) model and provide an efficient
Omega implementation.

There are several recent papers on timely solutions to problems related to Omega
implementation. Charron-Bost et al. [7] use a timely spanning tree to solve approximate
consensus. Lafuente et al. [12] implement an eventually perfect failure detector using a timely
cycle of processes.

2 Notation and Definitions 

Model specifics. To simplify the presentation, we use an even more general model than 
what is used in Aguilera et al. [1]. The major differences are as follows. We use infinite
capacity non-FIFO channels rather than single packet capacity channels. Our channel 
construct makes us explicitly state the packet fairness propagation assumptions that are 
somewhat obscured by the single capacity channels. 

In addition, we do not differentiate between a slow process and a slow channel since 
slow channels may simulate both. Omega implementation code is expressed in terms of 
guarded commands, rather than the usual more procedural description. The operation of 
the algorithm is a computation which is a sequence of these command executions. We 
express timeouts directly in terms of computation steps rather than abstract or concrete 
time. This simplifies reasoning about them. 

Despite the differences, the models are close enough such that all of the results in this 
paper are immediately applicable to the traditional Omega implementation model. 

Processes and computations. A computer network consists of a set N of processes. The
cardinality of this set is n. Each process has a unique identifier from 0 to n-1. Processes
interact by passing messages through non-FIFO unbounded communication channels. Each
process has a channel to all other processes. That is, the network is fully connected. A
message is constant size if the data it carries is in O(log n). For example, a constant size
message may carry several process identifiers but not a complete network spanning tree.


Each process has variables and actions. The action has a guard: a predicate over the
local variables and incoming channels of the process. An action is enabled if its guard
evaluates to true. A computation is a potentially infinite sequence of global network states
such that each subsequent state is obtained by executing an action enabled in the previous
state. This execution is a computation step. Processes may crash. A crashed process stops
executing its actions. A correct process does not crash.
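The guarded-command view of a computation can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `Action` and `run`, the dictionary-based state, and the random choice standing in for the scheduler are all hypothetical and not part of the paper's model.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]  # hypothetical: local variables of one process


@dataclass
class Action:
    name: str
    guard: Callable[[State], bool]     # predicate over the state
    command: Callable[[State], None]   # executed when the action is chosen


def run(state: State, actions: List[Action], max_steps: int,
        rng: random.Random) -> List[str]:
    """Execute one enabled action per computation step; return the trace."""
    trace = []
    for _ in range(max_steps):
        enabled = [a for a in actions if a.guard(state)]
        if not enabled:
            break  # no action is enabled: the computation ends
        chosen = rng.choice(enabled)  # stand-in for the scheduler
        chosen.command(state)
        trace.append(chosen.name)
    return trace


def increment(s: State) -> None:
    s["count"] += 1


# A toy process whose single action is enabled while count < 3.
state = {"count": 0}
actions = [Action("inc", lambda s: s["count"] < 3, increment)]
trace = run(state, actions, max_steps=10, rng=random.Random(0))
```

In this toy run the action is executed three times and then becomes disabled, so the computation is finite.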

Messages and packets. We distinguish between a packet and a message. A message is
particular content to be distributed to processes in the network. The origin is the process that
initiates the message. The identifier of the origin is included in the message. Messages
are sent via packets. A packet is a portion of data transmitted over a particular channel. A
message is the payload of a packet. A process may receive a packet and either forward
the message it contains or not. A process may not modify it: if a process needs to send
additional information, the process may send a separate message. A process may forward
the same message at most once. In effect, a message is transmitted to processes of the
network using packets. A particular process may receive a message either directly from
the origin, or indirectly, possibly through multiple hops.

Scheduling and fairness. We express process synchronization in terms of an
adversarial scheduler. The scheduler restrictions are as follows. We do not distinguish slow
processes and slow packet propagation. A scheduler may express these phenomena through
scheduling process action execution in a particular way. A packet transmission
immediately enables the packet receipt action in the recipient process. A packet is lost if the
receipt action is never executed. A packet is not lost if it is eventually received.

Timers. A timer is a construct with the following properties. A timer can be reset, stopped
and increased. It can also be checked whether the timer is on or off. It has a timeout
integer value and a timeout action associated with it. A timer is either a receiver timer or
a sender timer. If a sender timer is on, the timeout action is executed once the computation
has at most the timeout integer steps without executing the timer reset. If a receiver timer
is on, the timeout action is executed once the computation has at least the timeout integer
steps without executing the timer reset. Increasing the timer adds an arbitrary positive
integer value to the timeout integer. An off timer can be set to on by resetting it.
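The timer interface described above can be sketched in Python as follows. This is a hedged illustration of a receiver timer only: the class name `ReceiverTimer` and the explicit `step` method are assumptions made for the example. The definition only requires that at least the timeout number of steps pass before the timeout action runs; the sketch fires at the earliest permitted step and then restarts counting, which is one admissible behavior among many.

```python
from typing import Callable


class ReceiverTimer:
    """Step-counting receiver timer (illustrative; names are hypothetical)."""

    def __init__(self, timeout: int, on_timeout: Callable[[], None]) -> None:
        self.timeout = timeout        # timeout integer value
        self.on_timeout = on_timeout  # timeout action
        self.on = False
        self.elapsed = 0              # steps since the last reset

    def reset(self) -> None:
        # Resetting turns an off timer on and restarts the step count.
        self.on = True
        self.elapsed = 0

    def stop(self) -> None:
        self.on = False

    def increase(self, amount: int) -> None:
        # Adds a positive integer to the timeout value.
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.timeout += amount

    def is_on(self) -> bool:
        return self.on

    def step(self) -> None:
        """Account for one computation step without a reset."""
        if not self.on:
            return
        self.elapsed += 1
        if self.elapsed >= self.timeout:
            self.elapsed = 0  # assumption: keep running and count again
            self.on_timeout()


fired = []
timer = ReceiverTimer(3, lambda: fired.append("timeout"))
timer.reset()
for _ in range(2):
    timer.step()
timer.reset()          # e.g. a packet arrived before the timeout
for _ in range(3):
    timer.step()       # three steps without a reset: the action fires once
```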

Reliable and timely messages and packets. A packet is reliable if it is received. A 
message is reliable if it is received by every correct process. A channel is reliable if every 
packet transmitted over this channel is reliable. 

A channel is fair-lossy if it has the following properties. If there is an infinite number
of packet transmissions over a particular fair-lossy channel of a particular message type 
and origin, then infinitely many are received. We assume that a fair-lossy channel is not 
type discriminating. That is, if it is fair-lossy for one type and origin, it is also fair-lossy 
for every pair of message type and origin. 

Observe that if there is an infinite number of message transmissions of a particular
message type and origin over a path that is fair-lossy, then infinitely many succeed. The
converse is true as well: if there is an infinite number of successful message transmissions,
there must be a fair-lossy path between the origin and the destination.

A packet is timely if it is received within a bounded number of computation steps. 
Specifically, there is a finite integer B such that the packet is received within B steps. 
Naturally, a timely packet is a reliable packet. A message is timely if it is received by 
every process via a path of timely packets. A channel is timely if every packet transmitted 
over this channel is timely. A channel is eventually timely if the number of non-timely 
packets it transmits is finite. Note that a channel that transmits a finite number of packets 
is always eventually timely. 

The timely channel definition is relatively clear. The opposite, non-timely channel, is 
a bit more involved. A channel that occasionally delays or misses a few packets is not non- 
timely as the algorithm may just ignore the missed packets with a large enough timeout. 
Hence, the following definition. 

A channel is strongly non-timely if the following holds. If there is an infinite number of 
packet transmissions of a particular type and origin over a particular non-timely channel, 
then, for any fixed integer, there are infinitely many computation segments of this length 
such that none of the packets are delivered inside any of the segments. 

Similarly, non-timeliness has to be preserved across multiple channels: a message
may not gain timeliness by finding a parallel timely path. Otherwise, for example, two paths
could alternate delivering timely messages. Therefore, we add an additional condition for
non-timeliness.

All paths between a pair of processes x and y are strongly non-timely if x sends an
infinite number of messages to y, yet regardless of how the message is forwarded or what
path it takes, for any fixed integer, there are infinitely many computation segments of this
length such that none of the messages are delivered inside any of the segments. Unless
otherwise noted, when we discuss non-timely channels and paths, we mean strongly
non-timely channels and paths.

Communication models. To make it easier to address the variety of possible
communication restrictions, we define several models. The dependable (channel) model allows
eventually or perpetually reliable timely or fair-lossy channels. In the dependable model,
an algorithm may potentially discover the dependable channels by observing packet
propagation. The general propagation model does not allow either reliable or timely channels.
Thus, one message propagation is not related to another message propagation.

Message propagation graph. A message propagation graph is a directed graph over
network processes and channels that determines whether packet propagation over a particular
channel would be successful. This graph is connected and has a single source: the origin
process. This concept is a way to reason about scheduling of the packets of a particular
message.

Each message has two propagation graphs. In the reliable propagation graph R, each edge
indicates whether the packet is received if transmitted over this channel. In the timely
propagation graph T, each edge indicates whether the packet is timely if transmitted over this
channel. Since a timely packet is a reliable packet, for the same message, the timely
propagation graph is a subgraph of the reliable propagation graph. In general, a propagation
graph for each message is unique. That is, even for the same source process, the graphs
for two messages may differ. This indicates that different messages may take divergent
routes.

If a channel from process x to process y is reliable, then edge (x, y) is present in the
reliable propagation graph for every message where process x is present. In other words,
if the message reaches x and x sends it to y, then y receives it. A similar discussion applies
to a timely channel and corresponding edges in timely propagation graphs.

Propagation graphs are determined by the scheduler in advance of the message
transmission. That is, the recipient process, depending on the algorithm, may or may not
forward the received message along a particular outgoing channel. However, if the process
forwards the message, the presence of an edge in the propagation graph determines the
success of the message transmission. Note that the process forwards a particular message
at most once. Hence, the propagation graph captures the complete possible message
propagation pattern. A process may crash during message transmission. This crash does not
alter propagation graphs.

Proposition 1. A message is reliable only if its reliable propagation graph R is such that 
every correct process is reachable from the origin through non-crashed processes. 

Proposition 2. A message is timely only if its timely propagation graph T is such that 
every correct process is reachable from the origin through non-crashed processes. 
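The reachability condition of Propositions 1 and 2 can be checked mechanically on a given propagation graph. The following Python sketch is illustrative only: the function name `reaches_all_correct`, the edge-list representation, and the simplifying assumption that the origin itself has not crashed are ours, not the paper's.

```python
from collections import deque
from typing import Iterable, Set, Tuple


def reaches_all_correct(edges: Iterable[Tuple[int, int]], origin: int,
                        correct: Set[int]) -> bool:
    """Return True if every correct process is reachable from the origin
    in the propagation graph, moving only through non-crashed processes.

    Assumption for this sketch: `correct` is the set of non-crashed
    processes and the origin belongs to it.
    """
    if origin not in correct:
        return False
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    seen = {origin}
    queue = deque([origin])
    while queue:  # breadth-first search restricted to non-crashed processes
        u = queue.popleft()
        for v in adjacency.get(u, []):
            if v in correct and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == set(correct)


# Hypothetical propagation graph: 0 -> 1 -> 2 and 0 -> 3.
graph = [(0, 1), (1, 2), (0, 3)]
all_alive = reaches_all_correct(graph, 0, {0, 1, 2, 3})
one_crashed = reaches_all_correct(graph, 0, {0, 2, 3})  # process 1 crashed
```

With process 1 crashed, process 2 is cut off from the origin, so the necessary condition fails for that message.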


Omega implementation and its efficiency. An algorithm that implements the Omega
Failure Detector (or just Omega) is such that in a suffix of every computation, each correct
process outputs the identifier of the same correct process. This process is the leader.

An implementation of Omega is message efficient if the origin of all but finitely many
messages is a single correct process and all but finitely many messages are constant size.
An implementation of Omega is packet efficient if all but finitely many messages are
transmitted using O(n) packets.

An Omega implementation is super packet efficient if it is packet efficient and the
packets of all but finitely many messages are using the same channels. In other words, if a
packet of message $m_1$ is forwarded over some channel, then a packet of another message
$m_2$ is also forwarded over this channel. The intent of a super packet efficient algorithm is to
only use a limited number of channels infinitely often. Since a packet efficient algorithm
uses O(n) packets infinitely often, a super packet efficient algorithm uses O(n) channels
infinitely often.

3 Probabilistic Properties 

In this section, we contrast a multi-hop implementation of Omega and a classic
single-hop, also called direct channel, implementation. We assume each network channel is
timely with probability p. The timeliness probability of one channel is independent of
that of any other channel.



Leader existence probability. We assume that the leader may exist only if there is a
process that has timely paths to all processes in the network. In case of direct channel
implementation, the length of each such path must be exactly one.

As n grows, Omega implementations behave radically differently. Theorems 1 and 2
state the necessary conditions for leader existence and indicate that the probability of
leader existence for direct channel implementation approaches zero exponentially quickly,
while this probability for multi-hop implementation approaches one exponentially quickly.
In practical terms, a multi-hop Omega implementation is far more likely to succeed in
establishing the leader.

Theorem 1. If the probability of each channel to be timely is p < 1, then the
probability of leader existence in any direct channel Omega implementation approaches zero
exponentially fast as n grows.

Proof: Let $D_x$ be the event that some process x does not have direct timely channels
to all processes in the network. Its probability is $P(D_x) = 1 - p^{n-1}$. For two distinct
processes x and y, the events $D_x$ and $D_y$ depend on disjoint sets of channels, since
channels are oriented, and are therefore independent. Thus, if p < 1, the probability that no
leader exists is $P(\bigcap_{x \in N} D_x) = (1 - p^{n-1})^n \to 1$. □
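As a numerical illustration, the following Python sketch evaluates the direct channel leader existence probability, reading the proof above as giving $(1 - p^{n-1})^n$ for the probability that no process has direct timely channels to all others. The function name `direct_leader_probability` is an assumption made for this example.

```python
def direct_leader_probability(p: float, n: int) -> float:
    """Probability that at least one of n processes has direct timely
    channels to all n - 1 others, with independent channel timeliness p."""
    no_leader = (1.0 - p ** (n - 1)) ** n
    return 1.0 - no_leader


# Compare a small and a larger network for the same channel timeliness.
small = direct_leader_probability(0.9, 10)
large = direct_leader_probability(0.9, 100)
```

For a fixed p < 1 the value for the larger network is smaller, in line with the statement of the theorem.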

Theorem 2. If the probability of each channel to be timely is p < 1, then the probability
of leader existence in any multi-hop Omega implementation approaches 1 exponentially
fast as n grows.

Proof: A channel is bitimely if it is timely in both directions. The probability that there
exists at least one process such that there exist timely paths from this process to all other
processes is greater than the probability to reach them through bitimely paths. We use the
probability of the latter as a lower bound for our result. If p is the probability of a channel
to be timely, $\hat{p} = p^2$ is the probability that it is bitimely. Consider graph G where the
edges represent bitimely channels. It is an Erdős-Rényi graph where an edge exists with
probability $\hat{p}$. It was shown (see [10]) that
$P(G \text{ is connected}) \sim 1 - n(1 - \hat{p})^{n-1} \to 1$. □
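The bitimely-graph argument can be explored numerically. The Python sketch below samples random graphs in which each undirected edge is present with probability p squared and estimates how often the graph is connected; it also evaluates the asymptotic expression 1 - n(1 - p^2)^(n-1). The function names, the seed, and the trial counts are assumptions chosen for illustration, and the estimate is a Monte Carlo approximation rather than an exact value.

```python
import random


def is_connected(n: int, edge_prob: float, rng: random.Random) -> bool:
    """Sample a random graph on n vertices and test connectivity
    with a union-find structure."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < edge_prob:  # edge (u, v) is bitimely
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    components -= 1
    return components == 1


def bitimely_connectivity_estimate(p: float, n: int, trials: int,
                                   seed: int = 0) -> float:
    """Fraction of sampled bitimely graphs that are connected."""
    rng = random.Random(seed)
    hits = sum(is_connected(n, p * p, rng) for _ in range(trials))
    return hits / trials


def asymptotic_connectivity(p: float, n: int) -> float:
    """The asymptotic expression 1 - n * (1 - p^2)^(n - 1)."""
    return 1.0 - n * (1.0 - p * p) ** (n - 1)


estimate = bitimely_connectivity_estimate(0.7, 30, trials=200)
approximation = asymptotic_connectivity(0.7, 30)
```

Comparing `estimate` and `approximation` for growing n gives a feel for how quickly connectivity of the bitimely graph becomes near certain.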


Leader stability. As in the previous subsection, we assume the leader has timely paths to
all other processes in the network. If channel timeliness changes, this process may not
have timely paths to all other processes anymore. Leader stability time is the expected
number of rounds of such channel timeliness change where a particular process remains
the leader.

Again, direct channel and multi-hop implementations of Omega behave differently.
Direct channel leader stability time approaches zero as n increases and cannot be
limited from below by fixing a particular value of channel timeliness probability. Multi-hop
leader stability goes to infinity exponentially quickly. In a practical setting, a leader is
significantly more stable in a multi-hop Omega implementation than in a direct channel
one.


Theorem 3. In any direct channel Omega implementation, if the probability of each
channel to be timely is p < 1, leader stability time goes exponentially fast to 0 as n
grows. If leader stability time is to remain above a fixed constant E > 0, then the channel
timeliness probability p must converge to 1 exponentially fast as n grows.

Proof: At a given time, a given process has timely channels to all other processes with
probability $p^{n-1}$. The number of rounds X a given process retains timely paths to all other
processes follows a geometric distribution $P(X = r) = q^r(1 - q)$, where $q = p^{n-1}$. Thus,
the expected number of rounds a process retains timely channels to all other processes is
$E(X) = \frac{q}{1-q} = \frac{p^{n-1}}{1 - p^{n-1}}$, which tends exponentially fast toward 0
if p is a constant less than 1.
Assume E(X) converges towards a given fixed number E as n tends towards infinity.
That is, we need $\lim_{n \to \infty} q = \frac{E}{1+E}$. Then, $p^{n-1}$ tends to
$\frac{E}{1+E}$, which implies that p converges towards 1 exponentially fast. □
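The two claims of Theorem 3 can be checked numerically with a short Python sketch. It assumes the geometric distribution stated in the proof, whose mean is q / (1 - q) with q = p^(n-1); the function names `direct_stability_time` and `required_timeliness` are hypothetical and introduced only for this illustration.

```python
def direct_stability_time(p: float, n: int) -> float:
    """Expected number of rounds a process keeps direct timely channels
    to all n - 1 others: q / (1 - q) with q = p^(n - 1)."""
    q = p ** (n - 1)
    if q >= 1.0:
        return float("inf")  # p = 1: the channels never stop being timely
    return q / (1.0 - q)


def required_timeliness(target: float, n: int) -> float:
    """Channel timeliness p for which the stability time equals `target`,
    obtained by solving p^(n - 1) = target / (1 + target)."""
    return (target / (1.0 + target)) ** (1.0 / (n - 1))


# Stability time shrinks quickly for fixed p as the network grows.
times = [direct_stability_time(0.9, n) for n in (5, 20, 80)]

# To keep the stability time at 2 rounds, p must move towards 1.
needed = [required_timeliness(2.0, n) for n in (5, 20, 80)]
```

The list `times` is decreasing while `needed` is increasing towards 1, matching the two halves of the theorem.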

Theorem 4. In any multi-hop Omega implementation, if the probability of each channel
to be timely is p < 1, leader stability time goes to infinity exponentially fast as n grows.
If leader stability time is to remain above a fixed constant E > 0, then channel timeliness
probability may converge to 0 exponentially fast as n grows.

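Proof: If we fix p, 0 < p < 1, we have
$P(G \text{ is connected}) \sim 1 - n(1 - \hat{p})^{n-1}$ (see [10]), where G is the graph of
bitimely channels and $\hat{p} = p^2$ is the probability that a channel is bitimely.
Then, the expected number of rounds a given process retains timely paths to all other
processes is asymptotically $\frac{q}{1-q}$ with $q = P(G \text{ is connected})$, which
increases exponentially fast.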

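Assume E(X) converges towards a given fixed number E as n tends to infinity. This
means that

$$\lim_{n \to \infty} P(G \text{ is connected}) = \frac{E}{1+E} = e^{-e^{-c}}.$$

Using well-known results of random graph theory [4], we can take the edge probability of G to be

$$\hat{p}(n) = \frac{\ln n}{n} + \frac{c}{n} = \frac{\ln n}{n} - \frac{\ln \ln (1 + E^{-1})}{n}.$$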


4 Necessity and Sufficiency Properties 

We now explore the properties of deterministic Omega implementation. 

Model independent properties. The Omega implementation properties below are
applicable to both the general propagation and the dependable channel models.

Theorem 5. In an implementation of Omega, at least one correct process needs to send 
infinitely many timely messages. 

Proof: Assume $\mathcal{A}$ is an implementation of Omega where every correct process sends a
finite number of timely messages. Start with a network where all but two processes x and
y crash, and wait until all timely messages are sent. Since $\mathcal{A}$ is an implementation of Omega,
eventually x and y need to agree on the leader. Let it be x. Since all timely messages are
sent, the remaining messages may be delayed arbitrarily. If x now crashes, process y must
eventually elect itself the leader. Instead, we delay messages from x to y. The crash and
the delay are indistinguishable to y so it elects itself the leader. We now deliver messages
in an arbitrary manner. Again, since $\mathcal{A}$ implements Omega, x and y should agree on the
leader. Let it be y. The argument for x is similar. We then delay messages from y to x
forcing x to select itself the leader. We continue this procedure indefinitely. The resultant
sequence is a computation of $\mathcal{A}$. However, at least one process, either x or y, oscillates in
its leader selection infinitely many times. To put it another way, this process never elects the
leader. This means that, contrary to the initial assumption, $\mathcal{A}$ is not an implementation of
Omega. This proves the theorem. □

If a single process sends an infinite number of messages in a message efficient
implementation of Omega, this process must be the leader. Otherwise, processes are not able to
recognize the crash of the leader. Hence the following corollary of Theorem 5.

Corollary 1. In a message efficient implementation of Omega, the leader must send
infinitely many timely messages.


General propagation model properties. 

Lemma 1. To timely deliver a message in the general propagation model, each recipient 
process needs to send it across every outgoing channel, except for possibly the channels 
leading to the origin and the sender. 

Proof: Assume the opposite. There exists an algorithm $\mathcal{A}$ that timely delivers message
m from the origin x to all processes in the network such that some process y receives it
timely yet does not forward it to some process z ≠ x.

Consider the propagation graph T for m to be as follows.

x → y → z → rest of the processes

That is, the timely paths to all processes lead from x to y then to z. If $\mathcal{A}$ is such that x
sends m to y, then, by assumption, y does not forward m to z. Therefore, no process except
for y gets m through timely packets. By definition of the timely message, m is not timely
received by these processes. If x does not send m to y, then none of the processes receive
a timely message. In either case, contrary to the initial assumption, $\mathcal{A}$ does not timely
deliver m to all processes in the network. □

The below corollary follows from Lemma 1.

Corollary 2. It requires $\Omega(n^2)$ packets to timely deliver a message in the general
propagation model.
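The quadratic packet count can be illustrated by simulating the forwarding rule of Lemma 1 in a fully connected network. The Python sketch below is a simplified illustration, not the paper's construction: it assumes synchronous rounds in which every process first receives the message directly from the origin, and the function name `flood_packet_count` is hypothetical.

```python
def flood_packet_count(n: int, origin: int = 0) -> int:
    """Count packets when the origin sends a message to everyone and each
    recipient forwards it once over every outgoing channel except those
    leading to the origin and to the sender."""
    received_from = {origin: None}
    in_flight = [(origin, v) for v in range(n) if v != origin]
    packets = len(in_flight)
    while in_flight:
        next_round = []
        for sender, receiver in in_flight:
            if receiver in received_from:
                continue  # this process already forwarded the message once
            received_from[receiver] = sender
            for v in range(n):
                if v not in (receiver, origin, sender):
                    next_round.append((receiver, v))
        packets += len(next_round)
        in_flight = next_round
    return packets


counts = {n: flood_packet_count(n) for n in (2, 5, 10)}
```

Under these assumptions the origin sends n - 1 packets and each of the n - 1 recipients forwards n - 2 more, for a total of (n - 1)^2 packets per message.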

Combining Corollary 2 and Theorem 5, we obtain Corollary 3.


Corollary 3. In the general propagation model, there does not exist a message and packet 
efficient implementation of Omega. 

Proposition 3. There exists a message efficient implementation of Omega in the general 
propagation model where each correct process can send reliable messages to the leader. 

The algorithm that proves the above proposition is a straightforward extension of the
second algorithm in Aguilera et al. [1] where every process re-sends received messages
to every outgoing channel.

Dependable channel model properties. 

Lemma 2. In any message efficient implementation of Omega, each correct process must 
have a fair-lossy path to the leader. 

Proof: Assume there is a message efficient implementation $\mathcal{A}$ of Omega where there
is a correct process x that does not have a fair-lossy path to the leader. According to
Corollary 1, x itself may not be elected the leader. Assume there is a computation $\sigma_1$ of
$\mathcal{A}$ where process y ≠ x is elected the leader. Note that fair-lossy channels are not type
discriminating. That is, if x does not have a fair-lossy path to y, but has a fair-lossy path to
some other process z, then z does not have a fair-lossy path to y either. Thus, there must
be a set of processes $S \subset N$ such that $x \in S$ and $y \notin S$ that do not have fair-lossy paths to
processes outside S.

Since $\mathcal{A}$ is message efficient, processes of S only send a finite number of messages to
y. Consider another computation $\sigma_2$ which shares a prefix with $\sigma_1$ up to the point where the
last message from processes of S is received outside of S. After that, all messages from y
to processes in S and all messages from S to outside are lost. That is, in $\sigma_2$, y does not have
timely, or even fair-lossy, paths to processes of S. It is possible that some other process
w is capable of timely communication to all processes in the network. However, since $\mathcal{A}$
is efficient, no process other than y is supposed to send infinitely many messages.

Since all messages from S are lost, $\sigma_1$ and $\sigma_2$ are indistinguishable for the correct
processes outside S. Therefore, they elect y as the leader. However, processes in S receive
no messages from y. Therefore, they have to elect some other process u to be the leader.
This means that $\mathcal{A}$ allows correct processes to output different leaders. That is, $\mathcal{A}$ is not
an implementation of Omega. □

We define a source to be a process that does not have incoming timely channels. 

Lemma 3. To timely deliver a message in the dependable channel model, each
recipient needs to send it across every outgoing channel to a source, except for possibly the
channels leading to the origin and the sender.

The proof of the above lemma is similar to the proof of Lemma 1. Observe that
Lemma 3 states that the timely delivery of a message requires n packets per source. If
the number of sources is proportional to the number of processes in the network, we
obtain the following corollary.


Corollary 4. It requires $\Omega(n^2)$ packets to timely deliver a message in the dependable
channel model where the number of sources is proportional to n.

Theorem 6. In the dependable channel model, the following conditions are necessary
and sufficient for the existence of a packet and message efficient implementation of Omega:

(i) there is at least one process $l$ that has an eventually timely path to every correct process;

(ii) every correct process has a fair-lossy path to $l$.

Proof: We demonstrate sufficiency by presenting, in the next section, an algorithm that
implements Omega in the dependable channel model with the conditions of the theorem.

We now focus on proving necessity. Let us address the first condition of the
theorem. Assume there is a message and packet efficient implementation $\mathcal{A}$ of Omega in the
dependable channel model even though no process has eventually timely paths to every
correct process. Let there be a computation of $\mathcal{A}$ where some process x is elected the
leader even though x does not have a timely path to each correct process. According to
Corollary 1, x needs to send infinitely many timely messages. According to Corollary 4,
each such message requires $\Omega(n^2)$ packets. That is, $\mathcal{A}$ may not be message and packet
efficient. This proves the first condition of the theorem. The second condition immediately
follows from Lemma 2. □

The theorem below shows that (plain) efficiency is all that can be achieved with the
necessary conditions of Theorem 6. That is, even if these conditions are satisfied, super
packet efficiency is not possible.

Theorem 7. There does not exist a message and super packet efficient implementation
of Omega in the dependable communication model even if there is a process $l$ with an
eventually timely path to every correct process and every correct process has a fair-lossy
path to $l$.

Proof: Assume the opposite. Suppose there exists a message and super packet efficient
algorithm $\mathcal{A}$ that implements Omega in the network where some process $l$ has an eventually
timely path to all correct processes and every correct process has a fair-lossy path to $l$.

Without loss of generality, assume the number of processes in the network is even.
Divide the processes into two sets $S_1$ and $S_2$ such that the cardinality of both sets is $n/2$.
Refer to Figure 1 for illustration. $S_1$ is completely connected by timely channels. Similarly,
$S_2$ is also completely connected by timely channels. The dependability of channels
between $S_1$ and $S_2$ is immaterial at this point.

Consider a computation $\sigma_1$ of $\mathcal{A}$ on this network where all processes in $S_1$ are correct
and all processes in $S_2$ crashed in the beginning of the computation. Since $\mathcal{A}$ is an
implementation of Omega, one process $l_1 \in S_1$ is elected the leader. Since $\mathcal{A}$ is message
efficient, only $l_1$ sends messages infinitely often. Since $\mathcal{A}$ is super packet efficient, only
$O(n)$ channels carry these messages infinitely often. Since the network is completely
connected, there are $(n/2)^2$ channels leading from $S_1$ to $S_2$. This is in $\Theta(n^2)$. Thus, there
is at least one channel (x, y) such that $x \in S_1$ and $y \in S_2$ that does not carry messages from
$l_1$ infinitely often.

Fig. 1. Network for computation $\sigma_3$ of Theorem 7.


Let us consider a computation $\sigma_2$ of $\mathcal{A}$ where all processes in $S_2$ are correct and all
processes in $S_1$ crash in the beginning of the computation. Similar to $\sigma_1$, there is a process
$l_2 \in S_2$ that is elected the leader in $\sigma_2$, and there is a channel (z, w) such that $z \in S_2$ and
$w \in S_1$ that carries only finitely many messages of $l_2$.
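
The counting step used in both computations is a pigeonhole argument: a linear number of channels cannot cover the quadratic number of channels between the two halves. The sketch below illustrates it for a small network. The process numbering, the function name `idle_cross_channels`, and the chain-shaped set of used channels are assumptions made for illustration only.

```python
def idle_cross_channels(n, used):
    """Return the channels from S1 to S2 that are absent from `used`.

    Assumed numbering: processes 0..n/2-1 form S1, processes n/2..n-1 form S2.
    `used` is the set of directed channels (x, y) that carry messages
    infinitely often.
    """
    half = n // 2
    cross = {(x, y) for x in range(half) for y in range(half, n)}
    return sorted(cross - set(used))


n = 8
# Hypothetical linear budget: a chain through all processes uses n - 1 channels.
used = {(i, i + 1) for i in range(n - 1)}
idle = idle_cross_channels(n, used)
print(len(idle), "of", (n // 2) ** 2, "cross channels stay idle")
```

Any channel returned by the function can play the role of (x, y) in the proof.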

We construct a computation $\sigma_3$ of $\mathcal{A}$ as follows. All processes are correct. Channel
dependability inside $S_1$ and $S_2$ is as described above. All channels between $S_1$ and $S_2$
are completely lossy, i.e., they lose every transmitted message. An exception is channel
(x, y) that becomes timely as soon as it loses the last message it is supposed to transmit.
Similarly, channel (z, w) becomes reliable as soon as it loses the last message.

To construct $\sigma_3$, we interleave the actions of $\sigma_1$ and $\sigma_2$ in an arbitrary manner. Observe
that to processes in $S_1$ computations $\sigma_1$ and $\sigma_3$ are indistinguishable. Similarly, to
processes in $S_2$, the computations $\sigma_2$ and $\sigma_3$ are indistinguishable.

Let us examine the constructed computation closely. Sets $S_1$ and $S_2$ are each completely
connected by timely channels, and (x, y), connecting $S_1$ and $S_2$, is eventually timely. This
means that $l_1$ has an eventually timely path to every correct process in the network. Moreover,
due to channel (z, w), every process has a fair-lossy path to $l_1$. That is, the conditions
of the theorem are satisfied. However, the processes of $S_1$ elect $l_1$ as their leader while the
processes of $S_2$ elect $l_2$. This means that the processes do not agree on a single leader.
That is, contrary to the initial assumption, $\mathcal{A}$ is not an implementation of Omega. The
theorem follows. □

5 MPO: Message and Packet Efficient Implementation of Omega

In this section we present an algorithm we call MPO that implements Omega in the fair-lossy
channel communication model. As per Theorem 6, we assume that there is at least
one process that has an eventually timely path to every correct process in the network and
every correct process has a fair-lossy path to this process.

Algorithm outline. The code of the algorithm is shown in Figure 2. The main idea of
MPO is for each process to attempt to claim the leadership of the network while discovering
the reliability of its channels. Each process weighs each channel by the number of
messages that fail to come across it. A lighter channel is considered more reliable. If a
process determines that it has the lightest paths to all processes in the network, the process
tries to claim leadership of the network.




constants
    p                               // process identifier
    N                               // set of network process identifiers, cardinality is n
    timers[p] length is TO

variables
    leader, initially ⊥             // local leader
    phases[n], initially zero       // current phase number
    edges[n][n], initially zero     // edge fault weights
    arbs[n], initially arbitrary    // arborescences
    timers[n], initially timers[p] on, others off
        length of timers[x], x ≠ p, is arbitrary  // timer to send/receive a message
    shout, initially zero           // process id to send alive to all neighbors

actions
    timeout(timers[q]) →
        if p = q then  // own/sender timeout, compute arb rooted in p based on edges
            newArb := arborescence(edges, p)
            newLeader := minWeight({arbs[r] | r ≠ p ∧ on(timers[r])}, newArb)
            if leader ≠ newLeader then  // leadership changes
                if newLeader = p then  // p gains leadership
                    arbs[p] := newArb
                    send startPhase(p, phases[p], arbs[p]) to N \ {p}
                if leader = p then  // p loses leadership
                    phases[p] := phases[p] + 1
                    send stopPhase(p, phases[p]) to N \ {p}
                leader := newLeader
            else  // leadership persists
                if leader = p then
                    shout := shout + 1 mod n
                    if shout ≠ p then
                        send alive(p, phases[p], shout) to arbs[p](p.children)
                    else  // my turn to shout
                        send alive(p, phases[p], shout) to N \ {p}
            reset(timers[p])  // own timer never off
        else  // neighbor timeout/receiver timeout, assume failed, increase, do not reset
            send failed(q, p, arbs[q](p.parent)) to N \ {p}
            increase(timers[q])

    receive startPhase(q, phase, arb) for the first time →
        // if new phase, propagate message, reset timer
        if p ≠ q ∧ phases[q] < phase then
            arbs[q] := arb
            phases[q] := phase
            send startPhase(q, phase, arb) to N \ {p}
            reset(timers[q])

    receive stopPhase(q, phase) for the first time →
        if p ≠ q ∧ phases[q] < phase then
            phases[q] := phase
            send stopPhase(q, phase) to N \ {p}
            stop(timers[q])

    receive alive(q, phase, sh) for the first time from r →
        if p ≠ q ∧ phases[q] = phase then
            if r = arbs[q](p.parent) then  // received through arborescence
                if sh ≠ p then
                    send alive(q, phase, sh) to arbs[q](p.children)
                else  // my turn to shout
                    send alive(q, phase, sh) to N \ {p}
                reset(timers[q])
            else  // received from elsewhere
                if off(timers[q]) then
                    reset(timers[q])

    receive failed(q, r, s) for the first time →
        if p = q then  // if p's alive failed
            edges[s][r] := edges[s][r] + 1  // increase weight of edge from parent
        else
            send failed(q, r, s) to N \ {p}


Fig. 2. Message and packet efficient implementation of Omega: MPO.



The leadership is obtained in phases. First, the leader candidate sends a startPhase message.
Then, the candidate periodically sends alive messages. In case an alive fails to reach
one of the processes on time, the recipient replies with failed. The size of startPhase
depends on the network size. The size of the other message types is constant.

The routes of the messages vary. Messages that are only sent finitely many times
are broadcast, i.e., sent across every channel in the network. Once a process receives such
a message for the first time, the process re-sends it along all of its outgoing channels.
Specifically, startPhase, stopPhase and failed are broadcast. The leader sends alive
infinitely often. Hence, for the algorithm to be packet efficient, alive has to be sent only
along selected channels. Message alive is routed through the channels that the origin
believes to be the most reliable.
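
To see why broadcast is affordable only for messages sent finitely many times, the following sketch counts the packets of one broadcast in a completely connected network. It is a simplified illustration: it assumes no packet is lost and no process crashes, and the function name `flood` is hypothetical.

```python
from collections import deque


def flood(n, origin):
    """Flood one message in a complete network of n processes without losses.

    Returns (packets, handled): the number of packets sent and how many times
    each process handled the message.
    """
    handled = [0] * n
    seen = {origin}
    handled[origin] = 1  # the origin counts as having processed its own message
    queue = deque((origin, r) for r in range(n) if r != origin)
    packets = len(queue)
    while queue:
        _sender, receiver = queue.popleft()
        if receiver in seen:
            continue  # duplicate packet: discarded
        seen.add(receiver)
        handled[receiver] += 1
        # First receipt: re-send along every outgoing channel.
        for r in range(n):
            if r != receiver:
                queue.append((receiver, r))
                packets += 1
    return packets, handled


print(flood(4, 0))
```

Under these assumptions every process handles the message once, while the broadcast costs n(n - 1) packets, which is quadratic in n.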

Specifically, alive is routed along the channels of a minimum weight arborescence: a
directed tree rooted in the origin reaching all other processes. The arborescence is
computed by the origin once it claims leadership. It is sent in the startPhase that starts a
phase. Once each process receives the arborescence, the process stores it in the arbs array
element for the corresponding origin. After receiving alive from a particular origin, the
recipient consults the respective arborescence and forwards the message to the channels
stated there.
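
As an illustration of what the origin computes, the sketch below finds a minimum weight arborescence by exhaustive search. This is not the procedure used by the paper, which does not specify one; a practical implementation would use an efficient algorithm such as Edmonds' algorithm. The names `min_arborescence` and `reaches_root`, and the matrix layout `edges[u][v]` for the weight of channel u → v, are assumptions of this sketch, which is suitable only for very small networks.

```python
from itertools import product


def reaches_root(v, parent, root):
    """Follow parent pointers from v; True if the root is reached without a cycle."""
    seen = set()
    while v != root:
        if v in seen:
            return False
        seen.add(v)
        v = parent[v]
    return True


def min_arborescence(edges, root):
    """Brute-force minimum weight arborescence rooted at `root`.

    edges[u][v] is the fault weight of channel u -> v.
    Returns (weight, parent), where parent[v] is the parent of v in the tree.
    """
    n = len(edges)
    others = [v for v in range(n) if v != root]
    best = None
    for choice in product(range(n), repeat=len(others)):
        parent = dict(zip(others, choice))
        if any(v == u for v, u in parent.items()):
            continue  # a process cannot be its own parent
        if not all(reaches_root(v, parent, root) for v in others):
            continue  # contains a cycle, so it is not a tree
        weight = sum(edges[u][v] for v, u in parent.items())
        if best is None or weight < best[0]:
            best = (weight, parent)
    return best


edges = [[0, 0, 5],
         [9, 0, 0],
         [9, 9, 0]]
print(min_arborescence(edges, 0))
```

In the example, the direct channel 0 → 2 has accumulated weight 5, so the lightest tree routes through process 1 instead.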

In addition to routing alive along the arborescence, each process takes turns sending
the leader's alive to all its neighbors. The reason for this is rather subtle: see Theorem 7
for details. Due to crashes and message losses, arbs for the leader at various processes
may not reach every correct process. For example, it may lead to a crashed process. Thus,
some processes may potentially not receive alive and, therefore, not send failed. Since
failed are not sent, the leader may not be able to distinguish such a state from a state with
correct arbs.

To ensure that every process receives alive, each process, in turn, sends alive to its
every neighbor rather than along the most reliable channels. Since only a single process sends
a particular alive message to all neighbors, the packet complexity remains $O(n)$.
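
The linear bound can be checked on a small example. The sketch below counts the packets used to deliver one alive message, assuming no losses and that every process receives the message through the arborescence. The function name `alive_packets` and the dictionary representation of the arborescence are assumptions of this sketch.

```python
def alive_packets(children, shout):
    """Count packets used to deliver one alive message.

    children[p] lists p's children in the origin's arborescence; every process
    appears as a key. The process whose id equals `shout` sends to all n - 1
    neighbors; every other process forwards only to its children.
    """
    n = len(children)
    return sum(n - 1 if p == shout else len(kids) for p, kids in children.items())


# Hypothetical arborescence rooted at process 0.
children = {0: [1, 2], 1: [3], 2: [], 3: []}
for shout in children:
    print(shout, alive_packets(children, shout))
```

The arborescence contributes at most n - 1 packets and the shouting process at most another n - 1, so the total never exceeds 2(n - 1).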

Message failed is sent if a process does not receive a timely alive. This message carries
the parent of the process which was supposed to send the alive. That is, the sender of failed
blames the immediate ancestor in the arborescence. Once the origin of the missing alive
receives failed, it increments the weight of the appropriate edge in edges, which stores the
weights of all channels. If a process has timely outgoing paths to all processes in the
network, its arborescence in edges converges to these paths.

Action specifics. The algorithm is organized in five actions. The first is a timeout action,
the other four are message-receipt actions.

The timeout action handles two types of timers: sender and receiver. Process p's own
timer (q = p) is a sender timer. It is rather involved. This timer is always on since the
process resets it after processing. First, the process computes the minimum weight of the
arborescence for each leader candidate. A process is considered a leader candidate if its
timer is on. Note that since p's own timer is always on, it is always considered.

The process with the minimum weight arborescence is the new leader. If the leadership
changes (leader ≠ newLeader), further selection is made. If p gains leadership
(newLeader = p), then p starts a new phase by updating its own minimum-weight
arborescence and broadcasting startPhase. If p loses leadership, it increments its phase and
broadcasts stopPhase bearing the new phase number.
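
The selection step of the sender timeout can be sketched as follows. The function name `select_leader` and its arguments are hypothetical, and breaking ties by the smaller identifier is an assumption of this sketch; the text above does not state a tie-breaking rule.

```python
def select_leader(p, own_weight, arb_weights, timer_on):
    """Pick the candidate with the lightest arborescence.

    arb_weights[r] is the weight of arbs[r] stored at p; timer_on[r] tells
    whether p's receiver timer for r is on. Process p is always a candidate.
    Ties are broken by the smaller identifier (an assumption of this sketch).
    """
    candidates = {p: own_weight}
    for r, weight in arb_weights.items():
        if r != p and timer_on.get(r, False):
            candidates[r] = weight
    return min(candidates, key=lambda r: (candidates[r], r))


# Process 2 knows arborescences of 0 and 1, but the timer of 1 is off.
print(select_leader(2, 5, {0: 3, 1: 1}, {0: True, 1: False}))
```

In the example, process 1 has the lightest stored arborescence but is not a candidate because its timer is off, so process 0 is selected.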

If the leadership persists (leader = newLeader) and p is the leader, it sends alive. 
Process p keeps track of whose turn it is to send alive to all its neighbors in the shout 
variable. The variable’s value rotates among the ids of all processes in the network. 

The neighbor timer (q ≠ p) is a receiver timer. If the process does not get alive on
time from q, then p sends failed. In case the process sends failed, it also increases the
timeout value for the timer of q, thus attempting to estimate the channel delay.

For our algorithm, the timer integers are as follows. The sender timer is an arbitrary
constant integer value TO. This value controls how often alive is sent. It does not affect
the correctness of the algorithm. Receiver timers initially hold an arbitrary value. The 
timer integer is increased every time there is a timeout. Thus, for an eventually timely 
channel, the process is able to estimate the propagation delay and set the timer integer 
large enough that the timeout does not occur. For untimely channels, the timeout value 
may increase without bound. 
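
The adaptation of a receiver timer can be illustrated with a small sketch. It assumes that each timeout increases the timer length by a fixed step and that a timer at least as long as the delay bound no longer expires; the text does not specify the increment, and the function name `timeouts_until_stable` is hypothetical.

```python
def timeouts_until_stable(initial_length, delay_bound, step=1):
    """Count the timeouts that occur before the timer length reaches the delay bound.

    Returns (timeouts, final_length).
    """
    length, timeouts = initial_length, 0
    while length < delay_bound:
        timeouts += 1      # alive arrived too late: the timeout fires
        length += step     # the receiver increases the timer length
    return timeouts, length


print(timeouts_until_stable(3, 10))
```

For an eventually timely channel the loop terminates, so only finitely many timeouts occur. For an untimely channel no finite delay bound exists, which corresponds to the timer growing without bound.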

The next four actions are message receipt handling. Note that a single process may 
receive packets carrying the same message multiple times across different paths. However, 
every process handles the message at most once: when it encounters it for the first time. 
Later duplicate packets are discarded. 

The second action is startPhase handling. The process copies the arborescence and 
phase carried by the message, rebroadcasts it and then resets the alive receiver timer 
associated with the origin process. The third action is the receipt of stopPhase which 
causes the recipient to stop the appropriate timer. 

The fourth action is alive handling. If alive is of the matching phase, it is further
considered. If alive comes through the origin's arborescence, the receiver sends alive to its
children in the origin's arborescence or broadcasts it. The process then resets the timer
to wait for the next alive. If alive comes from elsewhere, that is, it was the sender's turn
to send alive to all its neighbors, then p just resets the timeout and waits for an alive to
arrive from the proper channel. This forces the process to send failed if alive does not
arrive from the channel of the arborescence.

The last action is failed handling. If failed is in response to an alive originated by this 
process (p = q) then the origin process increments the weight of the edge from the parent 
of the reporting process to the process itself according to the message arborescence. If 
failed is not destined to this process, p rebroadcasts it. 
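
The handling of failed can be sketched in a few lines. The function name `handle_failed` and the nested-list representation of edges are assumptions of this sketch; duplicate suppression and the actual sending of packets are omitted.

```python
def handle_failed(p, q, r, s, edges):
    """Process failed(q, r, s) at process p.

    q is the origin of the missing alive, r the reporting process and s the
    parent of r in the origin's arborescence. Returns True if p has to
    rebroadcast the message, False if p is the origin and consumed it.
    """
    if p == q:
        edges[s][r] += 1  # blame the channel from the parent s to the reporter r
        return False
    return True


edges = [[0] * 3 for _ in range(3)]
print(handle_failed(0, 0, 2, 1, edges), edges[1][2])
```

Each consumed failed makes the blamed channel heavier, so a later arborescence computation at the origin is less likely to include it.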

6 MPO Correctness Proof

Correctness proof definitions. Throughout this section, $l$ is the identifier of the process
that has eventually timely paths to all other processes. For simplicity, we assume that $l$ is
the single such process. Denote B as the maximum number of steps in any timely channel
propagation delay. Process p is a local leader if $leader_p = p$, i.e., the process elected itself
the leader. A process may be a local leader but not the global leader. That is, several
processes may be local leaders in the same state. Let realArbs(x) for the origin process x be
the relation defined by arbs[x](y.children) at every process y. That is, realArbs(x) is the
distributed relation that determines how alive messages are routed if they are originated
by x.

Lemma 4. For any local leader process x and another correct process y such that y is not
reachable from x through timely channels over correct processes in realArbs(x), either (i)
realArbs(x) changes or (ii) x loses leadership, changes phase or receives infinitely many
failed messages.

Proof: To prove the lemma, it is sufficient to show that if realArbs(x) does not change
and x does not lose the leadership or change phase, then x receives infinitely many failed.

Let S be a set of correct processes that are reachable from x through timely channels
and through correct processes in realArbs(x). Since y is not reachable from x, S ≠ N.
Recall that every process has fair-lossy paths to all processes in the network. Therefore,
there is such a path from x to y. This means that there is a process $z \in S$ such that it has a
fair-lossy channel to $w \notin S$.

Let us examine process w more closely. The network is completely connected. Therefore,
all other processes from S have channels to w. Note that at least one channel, from z, is
fair-lossy. Moreover, since w does not belong to S, if realArbs(x) reaches w, the path to
w is not timely.

Since x is a local leader and does not lose its leadership, it sends infinitely many alive
messages. Other processes forward these alive along realArbs(x). Also, by the design of
the algorithm, every process takes turns sending alive to all of its neighbors rather than
forwarding it along realArbs(x). Let us examine the receipt of these messages by w.

Process z belongs to S. That is, the path from x to z in realArbs(x) is timely. This
means that z receives and sends infinitely many alive originated by x. Since the channel
from z to w is fair-lossy, infinitely many of these alive are delivered to w. In addition, w
possibly receives alive from other processes of S. Since none of these channels is part
of realArbs(x), when w receives alive from processes in S, it resets the corresponding
receive timer only when the timer is off. The timer is turned off only when the timeout is
executed and failed is broadcast.

The only possible way this receive timer is reset without the timeout action execution
is when w receives alive through realArbs(x). However, the path from x to w in
realArbs(x) is not timely. By the definition of non-timely paths, there are infinitely many
computation segments of arbitrary fixed length where no alive from x is delivered to w.
This means that, regardless of the timeout variable value at w, the alive messages generate
receiver timeouts. That is, infinitely many timeouts are executed at w. Each timeout
generates a failed message broadcast by w. Since there are infinitely many broadcasts,
infinitely many succeed in reaching x. Hence, the lemma. □

Lemma 5. If a process $x \neq l$ is a local leader in infinitely many states, then it receives
infinitely many failed messages.

Proof: Let $x \neq l$ be a local leader in infinitely many states of a particular computation
of the algorithm. Once a process assumes local leadership, it may lose it either (i) by
increasing the weight of its minimum weight arborescence or (ii) by recording an arborescence
arbs[y] for a process y with lower weight than arbs[x].

A process increases the weight of its arborescence only when it gets a failed message.
Thus, to prove the lemma we need to consider the second case only.

Since x is a local leader in infinitely many states, it must gain local leadership back
after losing it to another process y. By the design of the algorithm, the weight of the
arborescence of any process in arbs may only increase. This means that once x gains the
leadership back from y, x may not lose it to y again without increasing the weight of its
own minimum weight arborescence. Thus, either x increases the weight of its arborescence
or, eventually, it has the lightest arborescence among the leader candidates.

In case x has the lightest arborescence, it either becomes heavier than some other
leader candidate's or x gets infinitely many failed. However, only the latter part of the
statement needs to be proven since x gains leadership infinitely often.

If x is a local leader, it does not send startPhase or stopPhase. Let us consider the
state where all startPhase packets are delivered. In this case realArbs(x) does not change.
Since $x \neq l$, even if all correct processes are reachable from x in realArbs(x), some links
in realArbs(x) are not timely. Then, according to Lemma 4, x gets infinitely many failed.

To summarize, if $x \neq l$ is a local leader in infinitely many states, it receives infinitely
many failed. □

Lemma 6. Process $l$ is a local leader in infinitely many states.

Proof: According to Lemma 5, either each process $x \neq l$ stops gaining local leadership
or the weight of its minimum arborescence grows infinitely high. If the latter is the case,
x has to gain and lose local leadership infinitely often. In this case, it sends startPhase
infinitely many times. Message startPhase is broadcast. Since every process x has fair-lossy
paths to $l$, by the definition of fair-lossy paths, infinitely many broadcasts succeed.
This means that the weight of arbs[x] at $l$ grows without bound. Therefore, if $l$ loses local
leadership, it gains it back infinitely often. □

The lemma below follows immediately from the operation of the algorithm.

Lemma 7. The timer length of timers[l] at every process either stops increasing or it
reaches TO + B(n - 1).

The lemma below follows from the assumption that the leader has an eventually
timely path to every correct process.

Lemma 8. In every computation, there is a suffix where each broadcast message sent by
$l$ is timely delivered to every correct process.

Lemma 9. An edge leading to process x in a timely path in realArbs(l) at $l$ generates
only finitely many failed.

Proof: The origin starts every phase with startPhase, then periodically sends zero or
more alive and then possibly ends the phase with a stopPhase that carries a phase number
greater than that of alive and startPhase.


Message failed is generated only when the timer expires at the receiving process. The
timer is reset by startPhase and alive. The timer is stopped by stopPhase.

We prove the lemma by showing that the timer reset by messages of a particular
phase expires only finitely many times. We start our consideration from the point of the
computation where the conditions of Lemmas 7 and 8 hold.

Only alive and startPhase may reset the timeout. Since the conditions of Lemma 8
hold, startPhase is delivered within B(n - 1) computation steps to all processes. Message
alive may be received earlier than startPhase. However, since such alive carries a phase
number that differs from the number stored at the recipient process, the message is
ignored. If alive arrives after startPhase, the reasoning is similar to the case where alive is
sent after startPhase, which is considered next.

Every alive sent after startPhase delivery travels over the timely path in realArbs(l).
At most every TO steps, either another alive or stopPhase is sent. Since the
path in realArbs(l) is timely, alive arrives at most after TO + B(n - 1) steps. Due to
Lemma 8, the same is true of stopPhase. That is, after alive is received, either another
alive or stopPhase is received within TO + B(n - 1) steps. The receipt of alive resets the
timeout. The receipt of stopPhase stops it. Due to Lemma 7, the timer does not expire.

Moreover, after the receipt of stopPhase, the subsequent alive messages are ignored 
since stopPhase carries a greater phase number. That is, after stopPhase is received, the 
timer is never reset or expires due to the messages of this phase. □ 

Lemma 10. Every untimely edge in realArbs(l) leading to a correct process either gets
removed or $l$ gets infinitely many failed messages.

Proof: Due to Lemma 6, process $l$ is a local leader in infinitely many states. Through
an argument similar to that of Lemma 5, we can show that eventually either $l$ gets failed
and increases the weight of its minimum arborescence or its minimum arborescence
becomes the lightest among the leader candidates. Then, $l$ can lose leadership only if it gets
failed.

In this case, according to Lemma 4, $l$ receives infinitely many failed messages or
either loses leadership, changes phase or changes realArbs(l). Observe that $l$ may change
phase only when it receives failed. It loses leadership only if it gets failed. The change of
realArbs(l) happens only when $l$ broadcasts startPhase after changing phase and,
therefore, getting failed. Due to Lemma 6, it gains the leadership back infinitely often.

That is, in any case, as long as realArbs(l) contains an untimely edge leading to a
correct process, $l$ gets infinitely many failed. □

The lemma below follows from Lemmas 9 and 10.

Lemma 11. Every computation of MPO contains a suffix where each channel of realArbs(l)
is timely.

Lemma 12. Every computation of MPO contains a suffix where realArbs(l) is the same
as arbs[l] in process $l$.


Proof: We start our consideration from the point where the conditions of the preceding
lemmas hold. Suppose realArbs(l) and arbs[l] differ for some process x. By the design of the
algorithm, this may happen only if arbs[l] in process x has an earlier phase than in $l$.
However, since phases differ, alive sent by $l$ are ignored by x. This leads to either x
sending failed to $l$ or claiming leadership. In either case, $l$ sends startPhase. According to
Lemma 8, this broadcast succeeds, which synchronizes arbs[l] and realArbs(l). □


Theorem 8. Algorithm MPO is a message and packet efficient implementation of Omega 
in the fair-lossy channel model. 


Proof: First, we prove that MPO implements Omega. Indeed, Lemma 6 shows that $l$
is a local leader in infinitely many states. Lemmas 9 and 10 show that $l$ gets finitely many
failed. According to Lemma 5, every process $x \neq l$ either stops being a local leader or gets
infinitely many failed. This means that at any process the arborescence of $l$ will eventually
be lighter than that of any other leader contender.

According to Lemma 6, $l$ sends infinitely many alive messages along realArbs(l).
Due to Lemma 11, realArbs(l) eventually has no untimely channels. Since $l$, as
established above, receives only finitely many failed, due to Lemma 4, realArbs(l) eventually
has timely paths from $l$ to every correct process. According to Lemma 12, realArbs(l) and
arbs[l] are eventually the same.

This means that $l$ will be a leader contender in every correct process. Since it has
the lightest arborescence, it becomes the leader at every correct process. In other words,
MPO is a correct implementation of Omega.

By the design of the algorithm, once $l$ has the lightest arborescence and all other correct
processes drop out of leadership contention, $l$ is the only process that sends alive messages.
By definition, MPO is message efficient.

The messages are routed along arbs[l]. It is an arborescence. Hence, the number of
such messages is in $O(n)$. In addition, each process takes a turn sending alive to its
neighbors. This is another $O(n)$ packets. Therefore, the packet complexity of MPO is in $O(n)$.
□


7 Algorithm Extensions 

We conclude the paper with several observations about MPO. The algorithm trivially 
works in a non-completely connected network provided that the rest of the assumptions 
used in the algorithm design, such as eventually timely paths from the leader to all correct 
processes, are satisfied. Similarly, the algorithm works correctly if the channel reliability 
and timeliness is origin-related. That is, a channel may be timely for some, not necessarily 
incident, process x, but not for another process y. 

Algorithm MPO may be modified to use only constant-size messages. The only non-constant
size message is startPhase. However, the message type is supposed to be timely.
So, instead of sending a single large message, the modified MPO may instead send a 
sequence of fixed-size messages with the content to be re-assembled by the receivers. 
If one of the constituent messages does not arrive on time, the whole large message is 
considered lost. 
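
The splitting and re-assembly can be sketched as follows. This is a minimal illustration under stated assumptions: the payload is a byte string, each fragment carries its index and the total fragment count, and timeliness is not modeled, so a late fragment is represented simply as a missing one. The names `fragment` and `reassemble` are hypothetical.

```python
def fragment(payload, size):
    """Split payload (bytes) into (index, total, chunk) fragments of at most `size` bytes."""
    chunks = [payload[i:i + size] for i in range(0, len(payload), size)] or [b""]
    return [(i, len(chunks), chunk) for i, chunk in enumerate(chunks)]


def reassemble(fragments):
    """Return the original payload, or None if any fragment is missing."""
    if not fragments:
        return None
    total = fragments[0][1]
    by_index = {i: chunk for i, t, chunk in fragments if t == total}
    if set(by_index) != set(range(total)):
        return None  # a constituent message is missing: the whole message is lost
    return b"".join(by_index[i] for i in range(total))


parts = fragment(b"arborescence-data", 4)
print(len(parts), reassemble(parts), reassemble(parts[:-1]))
```

Fragments may arrive in any order, since re-assembly relies on the index carried by each fragment.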